Skip to content

[refactor](be) Simplify exchange sink block sending#63574

Open
HappenLee wants to merge 2 commits into
apache:masterfrom
HappenLee:function
Open

[refactor](be) Simplify exchange sink block sending#63574
HappenLee wants to merge 2 commits into
apache:masterfrom
HappenLee:function

Conversation

@HappenLee
Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: ExchangeSinkBuffer still carried a legacy _send_multi_blocks flag and mixed single-block and multi-block request assembly. This made the sending logic harder to reason about and left a compatibility branch in the sender. The change unifies request assembly around the blocks repeated field and uses exchange_multi_blocks_byte_size > 0 as the only switch; when the option is unset or non-positive, each RPC carries a single block.

Release note

None

Check List (For Author)

  • Test: No need to test (logic-only refactor; verified diff and request flow)

    • No need to test
  • Behavior changed: Yes (unset/non-positive exchange_multi_blocks_byte_size now always uses single-block sending)

  • Does this need documentation: No

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: ExchangeSinkBuffer still carried a legacy _send_multi_blocks flag and mixed single-block and multi-block request assembly. This made the sending logic harder to reason about and left a compatibility branch in the sender. The change unifies request assembly around the blocks repeated field and uses exchange_multi_blocks_byte_size > 0 as the only switch; when the option is unset or non-positive, each RPC carries a single block.

### Release note

None

### Check List (For Author)

- Test: No need to test (logic-only refactor; verified diff and request flow)

    - No need to test

- Behavior changed: Yes (unset/non-positive exchange_multi_blocks_byte_size now always uses single-block sending)

- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@HappenLee
Copy link
Copy Markdown
Contributor Author

run buildall

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@HappenLee
Copy link
Copy Markdown
Contributor Author

/review

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31854 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3d43bbf377f9a2b07f43873fb138a5dae6735cd8, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17626	4070	4041	4041
q2	q3	10790	1419	819	819
q4	4687	481	347	347
q5	7615	2240	2158	2158
q6	245	175	138	138
q7	945	790	643	643
q8	9408	1719	1555	1555
q9	5150	4974	4992	4974
q10	6396	2221	1883	1883
q11	431	271	244	244
q12	627	422	295	295
q13	18114	3401	2789	2789
q14	266	263	238	238
q15	q16	823	786	712	712
q17	926	955	891	891
q18	6923	5929	5728	5728
q19	1350	1344	1113	1113
q20	632	447	301	301
q21	6332	2902	2661	2661
q22	568	386	324	324
Total cold run time: 99854 ms
Total hot run time: 31854 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4823	4738	4793	4738
q2	q3	5071	5267	4704	4704
q4	2210	2304	1440	1440
q5	5016	4731	4797	4731
q6	240	178	134	134
q7	1928	1722	1549	1549
q8	2461	2157	2092	2092
q9	7832	7354	7399	7354
q10	4766	4726	4221	4221
q11	542	392	370	370
q12	728	746	547	547
q13	3028	3416	2766	2766
q14	284	272	251	251
q15	q16	687	699	601	601
q17	1301	1270	1270	1270
q18	7514	6851	6892	6851
q19	1093	1092	1106	1092
q20	2239	2240	1965	1965
q21	5362	4659	4539	4539
q22	520	475	407	407
Total cold run time: 57645 ms
Total hot run time: 51622 ms

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one blocking compatibility issue. The refactor makes new exchange senders always serialize data in the repeated blocks field, even when exchange_multi_blocks_byte_size is unset or non-positive, and removes the legacy block sender path. Current receiver code still has request->has_block() explicitly marked as the old compatibility path, so during a rolling upgrade a new BE sender can send blocks to an old BE receiver that only understands block; that receiver will ignore the payload and only process EOS, causing dropped exchange data.

Critical checkpoints: goal/refactor is clear, but not safely accomplished because the protocol fallback is removed; change is small and focused; concurrency/lifecycle are not materially changed; no new config is added, but existing config semantics now no longer choose the legacy wire field; FE/BE protocol compatibility is the blocking concern; no transaction/persistence/data-write path is involved; no new observability appears necessary for this refactor; tests are missing for mixed-version or fallback behavior; user focus: no additional focus points were provided.

for (auto& req : requests) {
if (req.block && !req.block->column_metas().empty()) {
auto add_block = brpc_request->add_blocks();
add_block->Swap(req.block.get());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This drops the legacy block wire path and always sends exchange payloads through blocks, even when _send_multi_blocks_byte_size <= 0 and only one block is sent. That breaks mixed-version BE compatibility: VDataStreamMgr::transmit_data() still treats request->has_block() as the old compatibility path, so an older receiver that does not handle field 13 will ignore this payload and only see EOS, losing exchange rows during rolling upgrade. Please keep using set_allocated_block() when multi-block exchange is disabled, or gate the repeated blocks field on a version/compatibility check.

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 172687 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3d43bbf377f9a2b07f43873fb138a5dae6735cd8, data reload: false

query5	4334	654	518	518
query6	344	219	197	197
query7	4233	577	328	328
query8	339	234	214	214
query9	8866	4117	4118	4117
query10	453	344	300	300
query11	5786	2480	2238	2238
query12	181	130	133	130
query13	1291	596	425	425
query14	6170	5457	5194	5194
query14_1	4502	4493	4463	4463
query15	214	205	186	186
query16	978	447	412	412
query17	945	737	600	600
query18	2441	503	348	348
query19	212	198	154	154
query20	138	130	128	128
query21	213	140	116	116
query22	13680	13645	13377	13377
query23	17306	16627	16236	16236
query23_1	16429	16497	16455	16455
query24	7522	1785	1324	1324
query24_1	1342	1320	1340	1320
query25	561	499	447	447
query26	1307	330	186	186
query27	2684	585	345	345
query28	4504	2023	2025	2023
query29	1009	628	521	521
query30	309	243	203	203
query31	1140	1096	954	954
query32	93	78	74	74
query33	557	367	312	312
query34	1203	1134	645	645
query35	772	803	708	708
query36	1419	1410	1244	1244
query37	154	107	92	92
query38	3221	3168	3112	3112
query39	941	917	893	893
query39_1	884	888	878	878
query40	239	154	132	132
query41	71	70	69	69
query42	111	113	109	109
query43	335	334	301	301
query44	
query45	219	210	205	205
query46	1074	1179	739	739
query47	2372	2417	2271	2271
query48	418	389	318	318
query49	656	515	395	395
query50	980	359	259	259
query51	4392	4339	4359	4339
query52	107	106	98	98
query53	260	285	210	210
query54	331	287	270	270
query55	95	94	90	90
query56	321	337	324	324
query57	1462	1428	1339	1339
query58	312	298	279	279
query59	1594	1711	1446	1446
query60	319	326	309	309
query61	158	155	161	155
query62	700	630	587	587
query63	236	201	206	201
query64	2380	806	659	659
query65	
query66	1739	479	360	360
query67	30315	30407	29958	29958
query68	
query69	471	346	307	307
query70	1039	1028	965	965
query71	303	268	264	264
query72	3030	2745	2455	2455
query73	839	762	414	414
query74	5128	5019	4806	4806
query75	2704	2613	2259	2259
query76	2303	1142	774	774
query77	392	421	344	344
query78	12497	12619	11790	11790
query79	1513	1088	766	766
query80	707	535	453	453
query81	469	292	250	250
query82	1371	157	127	127
query83	336	271	251	251
query84	317	141	111	111
query85	901	539	471	471
query86	391	328	332	328
query87	3444	3377	3234	3234
query88	3623	2721	2715	2715
query89	457	401	341	341
query90	1950	182	191	182
query91	180	178	141	141
query92	77	78	71	71
query93	1494	1368	860	860
query94	550	363	323	323
query95	695	477	359	359
query96	1050	770	356	356
query97	2766	2766	2616	2616
query98	235	228	241	228
query99	1191	1151	1031	1031
Total cold run time: 255107 ms
Total hot run time: 172687 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 32.14% (9/28) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.78% (20854/38777)
Line Coverage 37.36% (197546/528796)
Region Coverage 33.64% (154660/459702)
Branch Coverage 34.64% (67338/194382)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants